System and Method for Detecting Generalized Space-Time Clusters

ABSTRACT

A system for detecting clusters in space and time using input data on occurrences of a phenomenon and characteristics at a plurality of locations and times comprises an expectation generation module determining expected occurrences of the phenomenon, and an occurrence modeling module determining actual occurrences of the phenomenon. The system further comprises a search module searching the expected occurrences and the actual occurrences for a plurality of candidate solutions, wherein each solution is represented as a set of points in the three-dimensional space, and wherein each point corresponds to a location at a time. The system comprises a convex container module determining at least one solution corresponding to a selected convex container shape from the plurality of candidate solutions, and a solution evaluation module determining a strength metric for each solution determined by the convex container module, the search module selecting a dominant cluster in the input data.

CROSS-REFERENCE TO RELATED APPLICATION

This is a Continuation Application of U.S. application Ser. No. 11/777,548 filed on Feb. 12, 2004, the disclosure of which is herein incorporated by reference in its entirety.

GOVERNMENT LICENSE RIGHTS

This invention was developed under Government Contract (DARPA Project F30602-01-C-0184). The U.S. Government has rights to this patent application.

BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention relates to modeling space-time data, and more particularly to detecting three-dimensional convex clusters in space and time for a phenomenon.

2. Discussion of Related Art

Detection of clusters in space and time, called space-time clusters, is an important function in various domains. For example, detection of such clusters is an important part of the investigation of disease outbreaks in the domain of epidemiology and public health. Other domains of application include medical imaging, urban planning and reconnaissance. The notion of what constitutes a cluster depends on the domain. For example, the spatial scan statistic as described in the paper “A spatial scan statistic” by Martin Kulldorff in Communications in Statistics: Theory and Methods, Volume 26, Number 6, 1997, is widely used in the epidemiology and public health domain. Other models of clustering might be appropriate in other domains.

Methods for detecting clusters may be developed depending on the clustering notion used. For example, the use of the scan statistic implies that earlier hierarchical approaches to clustering (see, for example, “Automatic subspace clustering of high dimensional data for data mining applications” by R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan in Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1998) cannot be applied. An example of a system that may handle the spatial scan statistic model for clustering is the SaTScan system. SaTScan may be used to detect space-time clusters with a cylindrical shape, representing a circular region in space for the entire duration of an interval in time.

SUMMARY OF THE INVENTION

The cylindrical shape may not represent clusters that shrink or grow with time. Also, it may not represent movement of the phenomenon over time. The exhaustive grid-based search that is utilized by systems like SaTScan may not be extended to more general shapes. Therefore, a need exists for a system that may detect generalized space-time clusters that model such characteristics of the underlying phenomenon.

A system for detecting clusters in space and time using input data on occurrences of a phenomenon and characteristics at a plurality of locations and times includes an expectation generation module, an occurrence modeling module, a search module, a convex container module, and a solution evaluation module.

The expectation generation module determines expected occurrences of the phenomenon at a plurality of locations and a plurality of times, and the occurrence modeling module determines actual occurrences of the phenomenon at a plurality of locations and a plurality of times. The search module searches the expected occurrences and the actual occurrences for a plurality of candidate solutions. Each solution is represented as a set of points in the three-dimensional space, wherein each point corresponds to a location at a time. The convex container module determines at least one solution corresponding to a selected convex container shape from the plurality of candidate solutions, and the solution evaluation module determines a strength metric for each solution determined by the convex container module, the search module selecting a solution having a desirable strength, wherein the solution having the desirable strength indicates a dominant cluster in the input data.

The search module selects a strongest solution as determined by the solution evaluation module.

A cache module may be included to save the solutions having the desired shape determined by the convex container module for previously examined sets of points.

The input data on occurrences of a phenomenon include counts and times of the occurrences of the phenomenon at the locations in a time period. The input data on characteristics of the locations and times include the populations subject to the occurrences of the phenomenon at the locations and times.

The expectation generation module generates expected counts of occurrences at the locations and times using a Poisson model.

The occurrence modeling module may determine the occurrences as equal to the occurrences in the input data. Alternatively, the occurrence modeling module determines the occurrences at the locations and times based on their characteristics and a domain dependent model. In that case, the occurrences are determined from the population using a Poisson model.

The search module considers candidate solutions represented as sets of points and utilizes the convex container module to determine solutions having the desired shape from the candidate solutions. The candidate solutions are initialized based on the input data. Each initial candidate solution is a singleton point.

The search module determines candidate solutions from solutions considered using an iterative process. Candidate solutions are created from solutions considered based on the chosen convex container shape.

The convex container module may determine a solution in various ways. For example, the convex container module determines a solution with minimum volume, given the selected convex container shape, that includes all the points in a given candidate solution. The convex container module may determine a solution with maximum volume, given the selected convex container shape, that excludes all the points not in the given candidate solution. According to another example, the convex container module may determine a solution that maximizes a measure representing the equality between the set of points in the given candidate solution and the set of points included in the solution, given the selected convex container shape.

The solution evaluation module determines the strength metric based on all the points included in the solution and the expected occurrences determined by the expectation generation module and the occurrences determined by the occurrence modeling module.

The strength metric is based on the likelihood ratio using the spatial scan statistic.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will be described below in more detail, with reference to the accompanying drawings:

FIG. 1 is an illustration of a system according to an embodiment of the present disclosure;

FIG. 2 is an illustration of a system according to an embodiment of the present disclosure;

FIGS. 3A and 3B are illustrations of a cluster with a truncated square pyramid shape according to an embodiment of the present disclosure where the three dimensional view is given in FIG. 3A and the two dimensional projection from the top on to the spatial plane is given in FIG. 3B;

FIG. 4 is an illustration of a cluster with a cylindrical shape;

FIGS. 5A and 5B are illustrations of a cluster with a truncated cone shape according to an embodiment of the present disclosure where the three dimensional view is given in FIG. 5A and the two dimensional projection from the top on to the spatial plane is given in FIG. 5B;

FIG. 6 shows results according to an embodiment of the present disclosure for cancer data using a square pyramid as a selected convex container shape; and

FIG. 7 is a flow chart of a method according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The detection of three-dimensional clusters in space and time is used for determining localized occurrences of a phenomenon of interest. Data comprising space and time information is analyzed to determine where and when the phenomenon of interest appears. A system and/or method analyzes the input data to determine a phenomenon of interest and uses a user-selected three-dimensional shape to determine properties of the phenomenon.

It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, the present invention may be implemented in software as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture.

Referring to FIG. 1, according to an embodiment of the present disclosure, a computer system 101 for implementing the present invention may comprise, inter alia, a central processing unit (CPU) 102, a memory 103, and an input/output (I/O) interface 104. The computer system 101 is generally coupled through the I/O interface 104 to a display 105 and various input devices 106 such as a mouse and keyboard. The support circuits may include circuits such as cache, power supplies, clock circuits, and a communications bus. The memory 103 may include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof. The present invention may be implemented as a routine 107 that is stored in memory 103 and executed by the CPU 102 to process the signal from the signal source 108. As such, the computer system 101 is a general-purpose computer system that becomes a specific purpose computer system when executing the routine 107 of the present invention.

The computer platform 101 also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.

It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings of the present disclosure provided herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

Referring to FIG. 2, a system and method for detecting space-time clusters is depicted. The clusters detected by this system may model growth or shrinkage over time of the phenomenon of interest. The clusters may also model movement of the phenomenon of interest with time. The input data 201 comprises information on the occurrences of the phenomenon being analyzed by locations and times 202, and the characteristics of these locations and times 203. The system is applicable in various domains. The information on the occurrences of the phenomenon 202 is the counts of the occurrences for each combination of location and time for the region and time period of interest. For example, in the public health domain, the information on the occurrences of the phenomenon 202 may comprise counts of occurrences of a particular disease for each county and month/year. The characteristics of these locations and times 203 comprise counts of the population subject to the phenomenon at each combination of location and time. For example, the characteristics of these locations and times 203 may comprise the population of the counties being considered at various times in the period of interest along with other relevant demographic information on the population (e.g., age, gender, etc.). A selected convex container shape for the detected clusters determines the properties of the phenomenon that may be modeled.

Referring to FIG. 2, the expectation generation module 204 determines the expected occurrences at the locations and times using the input data. The expected values are determined using the input characteristics of locations and times 203 and a domain dependent model for the expected behavior in the absence of any clustering. For example, in the epidemiology domain the expected counts for the occurrences of a disease may be modeled in the absence of any clustering using the Poisson distribution based on the population. Consider the analysis of the occurrences of a disease in a set of counties over an interval of time. The expected disease counts for any county in any year may be determined as being proportional to the population of the county during that year. Consider the example where the total number of disease cases in 10 counties over 10 years is 1000 and the total population is 100000 over all the 10 counties in each of the 10 years. The overall rate is then 1000 cases per 1000000 person-years, so the expected number of cases in a county with a population of 20000 in one year is 20. Other characteristics of the population may also be available. Demographics of the population like age and gender may be factored into the expectation generation by well-known statistical methods like indirect standardization.
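The proportional expectation described above can be illustrated with a minimal Python sketch. The function name expected_counts, the dictionary layout keyed by (county, year), and the way the remaining 80000 people are split over the other nine counties are assumptions made only for this illustration; the sketch ignores demographic adjustment such as indirect standardization.

```python
def expected_counts(populations, total_cases):
    """Spread total_cases over (location, time) cells in proportion to population.

    populations maps a hypothetical (county, year) key to the population at risk.
    """
    total_person_time = sum(populations.values())
    return {
        cell: pop * total_cases / total_person_time
        for cell, pop in populations.items()
    }


# Worked example from the text: 10 counties, 10 years, 100000 people per year.
# One county has 20000 people; the split of the other 80000 is assumed here.
other_counties = [10000] * 7 + [5000] * 2
populations = {}
for year in range(10):
    populations[("county0", year)] = 20000
    for k, pop in enumerate(other_counties, start=1):
        populations[(f"county{k}", year)] = pop

expected = expected_counts(populations, total_cases=1000)
print(expected[("county0", 0)])  # expected cases for the 20000-person county in one year
```

The expected counts produced this way sum to the total number of observed cases, which is the normalization usually assumed when a likelihood ratio is later computed from them.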

Referring to FIG. 2, the occurrence modeling module 205 determines the occurrences at each location and time. The occurrence modeling module 205 operates in different modes. A first mode is used to determine a strongest cluster in the input data of occurrences, where the strength of the cluster is determined by the solution evaluation module 208. In the first mode, the occurrence modeling module 205 uses the occurrences of the phenomenon 202 of the input data for analysis. Occurrence counts for the phenomenon are determined by location and time. A second mode is used to determine the strongest cluster of occurrences in a simulated environment. The model used as the basis for the expectation generation module 204 (e.g., a random sampling using a Poisson distribution) is used to determine the occurrences in the simulated environment. In the second mode, the occurrence modeling module 205 determines the occurrences based on the characteristics of the locations and times provided in the characteristics of these locations and times 203 of the input data using the model for expected occurrences in the absence of any clustering. For example, the occurrences are determined by random sampling from a Poisson distribution where the expected counts are proportional to the population information in the characteristics of these locations and times 203 of the input data. Other information, such as demographics in the characteristics of these locations and times 203 of the input data, may be factored in using modeling methods including, for example, indirect standardization. In the second mode the clusters in the random data of occurrences are determined. Experiments with these random data are used to determine the likelihood of finding clusters by chance that are comparable in strength to the strongest cluster detected in the actual data of occurrences. The cluster detected in the actual data of occurrences is significant if such strong clusters are unlikely to exist in the random datasets.
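The second mode can be sketched as a Monte Carlo experiment. The following Python sketch is an illustration under stated assumptions, not the disclosed implementation: sample_poisson uses Knuth's multiplication method (suitable for small means only), strongest_cluster_strength is a hypothetical callback standing in for the whole search over the simulated data, and the rank-based p-value is a common convention that the text does not prescribe.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def monte_carlo_p_value(observed_strength, expected, strongest_cluster_strength,
                        runs=99, seed=0):
    """Estimate how often random data yields a cluster at least as strong as observed.

    expected maps each (location, time) cell to its expected count.
    strongest_cluster_strength(counts, expected) is a hypothetical stand-in for the search.
    """
    rng = random.Random(seed)
    at_least_as_strong = 0
    for _ in range(runs):
        simulated = {cell: sample_poisson(mu, rng) for cell, mu in expected.items()}
        if strongest_cluster_strength(simulated, expected) >= observed_strength:
            at_least_as_strong += 1
    # Rank of the observed strength among the simulated ones (assumed convention).
    return (at_least_as_strong + 1) / (runs + 1)


# Toy stand-in for the search: the largest excess of any single cell.
def toy_strength(counts, expected):
    return max(counts[cell] - expected[cell] for cell in counts)


toy_expected = {("a", 0): 2.0, ("b", 0): 3.0}
p_value = monte_carlo_p_value(1000.0, toy_expected, toy_strength)
```

A small p-value indicates that clusters as strong as the observed one are unlikely to arise in the random datasets.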

Referring to FIG. 2, solutions in the search module 206 are represented by a set of points, where each point corresponds to a specific location at a specific time. The search module 206 calls the convex container module 207, wherein the convex container module 207 generates a solution corresponding to a user selected convex container shape from any given candidate solution containing a set of points. A solution of the convex container module 207 exists if a container with the selected convex container shape may be determined that includes exactly the points in the solution. The convex container module 207 makes a list of convex container shapes available to the user. The user selects from this list the convex container shape that models the phenomenon of interest most closely as determined by the user.

A solution of the convex container module 207 may be generated by any of a plurality of methods according to an embodiment of the present invention. For example, the convex container module generates a solution with minimum volume that includes all the points in the given set. According to another example, the convex container module generates a solution with maximum volume that excludes all the points not in the given set. According to yet another example, the convex container module generates a solution that is closest to the given set of points using a measure of equality between the given set of points and the set of points included in the solution. One measure of equality between two sets is the number of items that are in only one of the sets; minimizing this number makes the two sets closer to each other.
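The measure of equality described above is the size of the symmetric difference of the two sets. A minimal Python sketch follows; the function name set_mismatch and the (x, y, t) tuple representation of a point are assumptions for illustration.

```python
def set_mismatch(candidate_points, contained_points):
    """Count the points that belong to exactly one of the two sets."""
    return len(set(candidate_points) ^ set(contained_points))


# Points are hypothetical (x, y, t) tuples.
candidate = {(1, 1, 0), (2, 1, 0)}
contained = {(2, 1, 0), (3, 1, 0), (4, 1, 0)}
mismatch = set_mismatch(candidate, contained)  # (1,1,0), (3,1,0) and (4,1,0) are unmatched
```

A mismatch of zero means the container includes exactly the points of the candidate solution.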

A cache module may be added to the cluster detection system for efficiency purposes. The cache module stores the solutions generated by the convex container module 207 for previously encountered sets of points. If a set of points is encountered more than once, the solution for it may be retrieved from the cache module instead of invoking the convex container module again.
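One way such a cache could be organized is sketched below in Python. The class name ContainerCache, the fit_container callback (standing in for the convex container module) and the use of a frozenset as the lookup key are assumptions of this sketch rather than details given in the text.

```python
class ContainerCache:
    """Remember the container solution computed for each set of points."""

    def __init__(self, fit_container):
        self._fit = fit_container  # hypothetical stand-in for the convex container module
        self._store = {}
        self.misses = 0

    def solution_for(self, points):
        key = frozenset(points)  # the order in which points are listed does not matter
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fit(key)
        return self._store[key]


# Toy container fit: the bounding time interval of the points (x, y, t).
cache = ContainerCache(lambda pts: (min(p[2] for p in pts), max(p[2] for p in pts)))
first = cache.solution_for([(0, 0, 1), (1, 1, 4)])
second = cache.solution_for([(1, 1, 4), (0, 0, 1)])  # same set, served from the cache
```

Keying on an unordered, immutable set means the same candidate reached along different search paths is only fitted once.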

Convex container shapes may take any of a plurality of shapes. One example is a truncated pyramid with a square cross-section, where the square cross-section at any time in an interval represents the spatial extent of the cluster at the time. An example of a three-dimensional square pyramid cluster is shown in FIGS. 3A and 3B. FIG. 3A shows the three-dimensional view and FIG. 3B shows the boundaries of the square pyramid cluster at discrete times in the projection from the top on to the spatial plane. Contrast the flexibility of the square pyramid cluster of FIG. 3A with the cylindrical shape in the prior art shown in FIG. 4. The square pyramid shape of FIG. 3A may model growth or shrinkage over time. Since the pyramid's axis along the time dimension is not restricted to be orthogonal to the two spatial axes, it may also model movement over time.

Such a square pyramid may be specified by eight parameters as follows. Referring to FIG. 3A, the first two parameters are the earliest time 301 and latest time 302 in the pyramid interval. Referring to FIG. 3B, the next two parameters, A and B, represent the two spatial coordinates of the anchor vertex of the square cross-section 303 at the earliest time in the pyramid interval. The vertex of the cross-section with the smallest values for the spatial coordinates is chosen as the anchor vertex. Referring to FIG. 3B, the parameter G is the side of the square cross-section 303 at the earliest time in the pyramid interval. Similarly, the parameters C and D represent the spatial coordinates of the anchor vertex and the parameter H is the side of the square cross-section 304 at the latest time in the pyramid interval. The volume of the square pyramid is proportional to (G*G+G*H+H*H).

Accordingly, a solution of the convex container module 207 may be generated from a given set of points by solving a quadratic programming problem in which the volume as specified above using the parameters G and H of the square pyramid is minimized. Each point in the given set leads to four linear constraints in this quadratic programming formulation. These four constraints specify that the point is contained in the cross-section of the square pyramid at the specific time corresponding to the point. The cross-section of the square pyramid may be determined from the six parameters, A, B, C, D, G and H, using linear interpolation. The quadratic programming problem may be solved using any of a plurality of methods. For example, a quadratic programming problem with linear constraints may be solved exactly using any quadratic programming library package. According to another example, the quadratic programming problem may be solved approximately by considering a set of square pyramids that contain all the points in the given set and choosing the one with minimum volume. The square pyramid shape is one example of the convex container shapes that may be selected by the user for the convex container module 207.
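The eight-parameter description and the four containment constraints can be made concrete with a short Python sketch. The class and field names are hypothetical, and the factor of one third in the volume follows from the standard frustum formula under the assumption that the side length is interpolated linearly over the time interval; the text itself states only the proportionality to (G*G+G*H+H*H).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SquarePyramid:
    t_min: float  # earliest time
    t_max: float  # latest time
    a: float      # anchor vertex x at t_min
    b: float      # anchor vertex y at t_min
    g: float      # side at t_min
    c: float      # anchor vertex x at t_max
    d: float      # anchor vertex y at t_max
    h: float      # side at t_max

    def cross_section(self, t):
        """Return (anchor_x, anchor_y, side) at time t by linear interpolation."""
        span = self.t_max - self.t_min
        lam = 0.0 if span == 0 else (t - self.t_min) / span
        return (self.a + lam * (self.c - self.a),
                self.b + lam * (self.d - self.b),
                self.g + lam * (self.h - self.g))

    def contains(self, point):
        """Check the four constraints for a point (x, y, t)."""
        x, y, t = point
        if not self.t_min <= t <= self.t_max:
            return False
        ax, ay, side = self.cross_section(t)
        return ax <= x and x <= ax + side and ay <= y and y <= ay + side

    def volume(self):
        return (self.t_max - self.t_min) * (self.g ** 2 + self.g * self.h + self.h ** 2) / 3.0


# A cluster that both grows (side 2 to 6) and drifts (anchor x from 0 to 4).
pyramid = SquarePyramid(t_min=0, t_max=4, a=0, b=0, g=2, c=4, d=0, h=6)
print(pyramid.cross_section(2))     # anchor and side halfway through the interval
print(pyramid.contains((5, 3, 2)))  # inside the cross-section at t = 2
```

For fixed earliest and latest times, each of the four comparisons in contains is linear in A, B, C, D, G and H, which is what allows the minimum-volume fit to be posed as a quadratic program with linear constraints.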

Another example of a convex container shape is a truncated pyramid with a regular polygon for its cross-section. The generation of a solution by the convex container module 207 from a given set of points for this truncated pyramid container shape is an extension of the determination described above for the square pyramid. The volume of this pyramid is a similar function of the sides of the regular polygon at the minimum and maximum times of the pyramid interval. Each point in the given set leads to one linear constraint for each side of the polygon. The quadratic programming problem may be solved as described above for the square pyramid case. The truncated pyramid with a regular polygon is another example of the convex container shapes that may be selected by the user.

Another convex container shape that may be selected is a truncated cone with a circular spatial cross-section for each time in an interval. An example of this truncated cone is shown in FIGS. 5A and 5B. FIG. 5A shows the three-dimensional view and FIG. 5B shows the projection of the cone's outline at discrete points in time on the spatial plane. The centers of the circular cross-sections of the cone are also marked (*) in FIG. 5B. A truncated cone is specified by eight parameters. Referring to FIG. 5A, the first two parameters 501 and 502 are the earliest and latest time for the cone's time interval. Referring to FIG. 5B, the next two parameters 503 and 504 are the spatial coordinates of the center of the circular cross-section at the earliest time in the cone's interval. Parameters 505 and 506 are the spatial coordinates of the center of the circular cross-section at the latest time in the cone's interval. Parameters 507 and 508 are the radii of the cross-sections at the earliest and latest times in the cone's interval, respectively. A solution of the convex container module 207 may be generated from the given set of points for this convex container shape using a non-linear optimization formulation. This formulation minimizes the volume, which is a quadratic function of the two parameters representing the radii of the cross-sections at the earliest and latest times in the cone's interval. Each point in the given set leads to a non-linear constraint that specifies that it is contained within the cross-section of the cone at the specific time corresponding to the point. The constraint is derived from the equation of the circle once the center and the radius of the cross-section at the time corresponding to the point are determined using linear interpolation. This non-linear optimization is solved by approximate methods that consider only a subset of cones containing all the points in the given set and select the one with the smallest volume. The truncated cone is one of the convex container shapes that may be selected by the user.
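The approximate method for the truncated cone can be sketched in Python as follows. The class and function names are hypothetical, the candidate cones are supplied by the caller (how the subset of cones is chosen is not specified in the text), and the volume uses the standard frustum formula under the assumption of linearly interpolated radii.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TruncatedCone:
    t_min: float  # earliest time
    t_max: float  # latest time
    x0: float     # center x at t_min
    y0: float     # center y at t_min
    x1: float     # center x at t_max
    y1: float     # center y at t_max
    r0: float     # radius at t_min
    r1: float     # radius at t_max

    def cross_section(self, t):
        """Return (center_x, center_y, radius) at time t by linear interpolation."""
        span = self.t_max - self.t_min
        lam = 0.0 if span == 0 else (t - self.t_min) / span
        return (self.x0 + lam * (self.x1 - self.x0),
                self.y0 + lam * (self.y1 - self.y0),
                self.r0 + lam * (self.r1 - self.r0))

    def contains(self, point):
        """Non-linear constraint: the point (x, y, t) lies inside the circle at time t."""
        x, y, t = point
        if not self.t_min <= t <= self.t_max:
            return False
        cx, cy, radius = self.cross_section(t)
        return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2

    def volume(self):
        span = self.t_max - self.t_min
        return math.pi * span * (self.r0 ** 2 + self.r0 * self.r1 + self.r1 ** 2) / 3.0


def smallest_enclosing(candidates, points):
    """Among candidate cones containing every point, return the one with least volume."""
    enclosing = [cone for cone in candidates if all(cone.contains(p) for p in points)]
    return min(enclosing, key=lambda cone: cone.volume(), default=None)


points = [(0, 0, 0), (5, 1, 5), (10, 0, 10)]
candidates = [
    TruncatedCone(0, 10, 0, 0, 10, 0, 1, 1),  # moving, radius 1
    TruncatedCone(0, 10, 0, 0, 10, 0, 2, 2),  # moving, radius 2
    TruncatedCone(0, 10, 0, 0, 0, 0, 1, 1),   # stationary cylinder
]
best = smallest_enclosing(candidates, points)
```

In this toy case the stationary cylinder cannot enclose the drifting points, while the narrower moving cone does so with the smallest volume.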

One having ordinary skill in the art would recognize, in light of the present disclosure, that other convex containers may be used for clustering.

Referring to FIG. 2, the search module 206 searches for a strongest solution for the expectations and occurrences from modules 204 and 205, respectively, and the chosen convex container shape. The strength of a solution of the convex container module 207 is determined by the solution evaluation module 208. The search module 206 considers candidate solutions that are represented as sets of points and utilizes the convex container module 207 to generate solutions from these candidate solutions. The search module 206 creates candidate solutions in an iterative process from solutions determined by the convex container module 207 considered earlier in the iterative process. This iterative process may be initialized to a set of candidate solutions containing single points that have non-zero occurrences. The search module 206 stores a set of solutions determined by the convex container module 207 considered earlier in the iterative process with preference to storing the stronger solutions. Candidate solutions are created from a pair of stored solutions (P, Q) by first splitting the solutions P and Q using a random three-dimensional hyperplane and combining the pieces to create new sets of points as candidates. P and Q are chosen in a biased random fashion giving preference to the stronger solutions in the stored set of solutions. Candidate solutions are also created from a single stored solution R by modifying it based on the characteristics of the convex container shape corresponding to R. One or more points not in R but close to the convex container boundaries of the solution R are added to R to create a candidate solution. Similarly, one or more points in R and close to the convex container boundaries of the solution R are removed from R to create a candidate solution. The occurrence information at locations and times may be used in the selection of points for addition or removal. The strongest solution determined by the solution evaluation module 208 among those solutions determined by the convex container module 207 found in the search is output as a detected cluster.
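The pairwise candidate creation step can be illustrated with a small Python sketch. Several details are assumptions of this sketch and are not given in the text: the splitting plane is drawn through a randomly chosen point of P or Q with a Gaussian-distributed normal vector, the pieces are recombined by swapping sides, and the selection bias uses rank-based weights. The function names pick_parents and crossover are hypothetical.

```python
import random


def pick_parents(stored, rng):
    """Pick two stored solutions, favoring stronger ones (rank-based weights, assumed).

    stored is a list of (strength, points) pairs; the same solution may be picked twice.
    """
    ranked = sorted(stored, key=lambda item: item[0])
    weights = list(range(1, len(ranked) + 1))
    return rng.choices(ranked, weights=weights, k=2)


def crossover(p_points, q_points, rng):
    """Split P and Q with one random plane in (x, y, t) space and swap the pieces."""
    p_points, q_points = set(p_points), set(q_points)
    pivot = rng.choice(sorted(p_points | q_points))    # plane passes through a data point
    normal = [rng.gauss(0.0, 1.0) for _ in range(3)]   # random orientation

    def on_positive_side(point):
        return sum(n * (v - o) for n, v, o in zip(normal, point, pivot)) >= 0.0

    p_pos = {pt for pt in p_points if on_positive_side(pt)}
    q_pos = {pt for pt in q_points if on_positive_side(pt)}
    child_a = p_pos | (q_points - q_pos)   # positive part of P with negative part of Q
    child_b = q_pos | (p_points - p_pos)   # positive part of Q with negative part of P
    return child_a, child_b


rng = random.Random(1)
solution_p = {(0, 0, 0), (1, 0, 0), (1, 1, 1)}
solution_q = {(5, 5, 2), (6, 5, 2)}
parents = pick_parents([(1.0, solution_p), (5.0, solution_q)], rng)
child_a, child_b = crossover(solution_p, solution_q, rng)
```

Each child is only a candidate set of points; it would still be passed to the convex container module to obtain a solution of the selected shape before its strength is evaluated.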

A heuristic model may be used for determining sets of points likely to contain a strongest solution fitting a convex container shape. Therefore, the search module may consider a subset of all possible sets of points while having a measure of confidence that a strongest solution fitting a convex container shape will be determined.

The search module 206 may select candidate solutions using other methods, for example, by systematically considering all candidate solutions based on a repetitive process such as a grid. This does not imply that all solutions will be explored for the chosen convex container shape. This approach may be effective for some simple shapes.

Referring to FIG. 2, the solution evaluation module 208 determines a strength for any given solution of the convex container module 207. The strength measure may be customized based on the application domain. The strength measure is based on the points included in the solution and on the occurrences and expectations for the locations and times. One example of a strength measure is the likelihood ratio based on the spatial scan statistic as described in the paper “A spatial scan statistic” by Martin Kulldorff in Communications in Statistics: Theory and Methods, Volume 26, Number 6, 1997. Another example of a strength measure is simply the number of occurrences at the points included in the solution. Consider the case where input data includes occurrences of a phenomenon at various locations and times and also the corresponding population sizes. Another example of a strength measure is the occurrence density as defined by the total number of occurrences at the points included in the solution divided by the total population corresponding to the points included in the solution. The density could also be modified to take into account the spatial extent of the solution in addition to the population. Other strength measures customized to the domain of application may also be used.
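Two of the strength measures named above can be sketched in Python. The text does not spell out the likelihood ratio, so the form below is an assumption: it is the commonly used Poisson log likelihood ratio of the scan statistic, with expected counts scaled so that they sum to the total number of cases, and it scores only clusters with more cases than expected. The function names are hypothetical.

```python
import math


def poisson_log_likelihood_ratio(cases_in, expected_in, total_cases):
    """Log likelihood ratio for an excess of cases inside a candidate cluster.

    Assumes expected counts over all cells sum to total_cases.
    """
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    if expected_in <= 0 or expected_out <= 0:
        return 0.0
    if cases_in / expected_in <= cases_out / expected_out:
        return 0.0  # no excess inside the cluster
    ratio = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        ratio += cases_out * math.log(cases_out / expected_out)
    return ratio


def occurrence_density(cases_in, population_in):
    """Occurrences at the included points divided by the population at those points."""
    return cases_in / population_in


strength_excess = poisson_log_likelihood_ratio(40, 20.0, 1000)
strength_none = poisson_log_likelihood_ratio(20, 20.0, 1000)
density = occurrence_density(40, 20000)
```

Working with the logarithm keeps the values in a convenient range and does not change which solution is ranked strongest.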

An implementation according to an embodiment of the present invention was applied to data on cancer occurrences in 32 counties over a 19-year period.

occurrence.txt
Grant 1 1997 2 2
SanJuan 1 1974 8 2
Bernalillo 1 1977 13 1
DonaAna 1 1977 14 2
Union 1 1977 16 2
Sandoval 1 1977 11 1
Valencia 1 1977 17 1
DonaAna 1 1977 11 2
Valencia 1 1977 7 2
Bernalillo 1 1975 13 1

The file occurrence.txt contains a portion of the input occurrence data. Each line of this file specifies the county, number of occurrences of cancer, the year of occurrence and values for the other characteristics like age group and gender of the cancer patients.

population.txt
Bernalillo 73 15537 1 1
Bernalillo 73 14931 1 2
Catron 73 99 1 1
Catron 73 99 1 2
Chaves 73 1888 1 1
Chaves 73 1867 1 2
Colfax 73 567 1 1
Colfax 73 526 1 1
Curry 73 2192 1 1
Curry 73 2130 1 2

location.txt
Bernalillo 66 102
Catron 8 57
Chaves 126 47
Colfax 123 162
Curry 161 79
DeBaca 132 82
DonaAna 64 11
Eddy 136 13
Grant 22 25
Guadelupe 129 97

The files location.txt and population.txt contain a portion of the characteristics of the locations and times. The file location.txt contains the Cartesian coordinates defining the location of each county being analyzed. Each line of the file population.txt contains a county name, year, and population for each set of values of the other characteristics (e.g., age group and gender). The cluster detected with the square pyramid as chosen convex container shape is shown in FIG. 6. This cluster extends from the year 1982 to the year 1989. There are 292 cases included in this cluster, which is significantly larger than the 211.57 expected if there were no clusters using the Poisson model and indirect standardization. The results in FIG. 6 demonstrate how the system and method are able to capture characteristics like growth and movement in the phenomenon (e.g., cancer occurrences) that would not be possible with the prior art.

Referring to FIG. 7, a method according to an embodiment of the present disclosure takes as input data on occurrences of a phenomenon at locations and times and data on characteristics of the locations and times 701. The occurrences of a phenomenon and characteristics of locations and times are given for points in space and time. Actual occurrences of the phenomenon are determined at the points according to the input data 702. Expected occurrences are determined in the absence of any clustering of the phenomenon at the points according to the characteristics of the locations and times using a domain dependent model for the phenomenon 703. A convex container shape is selected for determining a cluster in the input data 704. The convex container shape may be selected at any time before searching for solutions. Solutions represented as a set of points are sought that conform to the selected convex container shape for a cluster with a highest strength metric that is based on the number of occurrences included in the cluster and the input data 705.

Having described embodiments for a system for detecting generalized space-time clusters, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

1. A method for detecting clusters comprising: receiving input data on occurrences of a phenomenon at locations and times and data on characteristics of the locations and times; determining actual occurrences of the phenomenon at the three-dimensional space-time points according to the input data; determining expected occurrences in the absence of any clustering of the phenomenon at the points according to the characteristics of the occurrences using a domain dependent model for the phenomenon; selecting a convex container shape for determining a cluster in the input data; and determining a solution represented as a set of points that conforms to the selected convex container shape for a cluster with a desirable strength.
2. The method of claim 1, further comprising caching solutions conforming to the selected convex container shape.
3. The method of claim 1, wherein determining expected occurrences further comprises generating expected counts of occurrences at the locations and times using a model.
4. The method of claim 1, further comprising determining a strength of the cluster based on points included in the cluster, and the expected occurrences and the actual occurrences.
5. A program storage device is provided readable by machine, tangibly embodying a program of instructions automatically executable by the machine to perform method steps for determining clusters, the method steps comprising: receiving input data on occurrences of a phenomenon at locations and times and data on characteristics of the locations and times; determining actual occurrences of the phenomenon at the three-dimensional space-time points according to the input data; determining expected occurrences in the absence of any clustering of the phenomenon at the points according to the characteristics of the occurrences using a domain dependent model for the phenomenon; selecting a convex container shape for determining a cluster in the input data; and determining a solution represented as a set of points that conforms to the selected convex container shape for a cluster with a desirable strength.
6. The method of claim 5, further comprising caching solutions conforming to the selected convex container shape.
7. The method of claim 5, wherein determining expected occurrences further comprises generating expected counts of occurrences at the locations and times using a model.
8. The method of claim 5, further comprising determining a strength of the cluster based on points included in the cluster, and the expected occurrences and the actual occurrences.